Bayesian inference for a wavefront model of the Neolithisation of Europe 
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We consider a wavefront model for the spread of Neolithic culture across Europe, and use Bayesian 
inference techniques to provide estimates for the parameters within this model, as constrained 
by radiocarbon data from Southern and Western Europe. Our wavefront model allows for both 
an isotropic background spread (incorporating the effects of local geography), and a localized 
anisotropic spread associated with major waterways. We introduce an innovative numerical scheme 
to track the wavefront, and use Gaussian process emulators to further increase the efficiency of our 
model, thereby making Markov chain Monte Carlo methods practical. We allow for uncertainty in 
the fit of our model, and discuss the inferred distribution of the parameter specifying this uncer- 
tainty, along with the distributions of the parameters of our wavefront model. We subsequently 
use predictive distributions, taking account of parameter uncertainty, to identify radiocarbon sites 
which do not agree well with our model. These sites may warrant further archaeological study, or 
motivate refinements to the model. 
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I. INTRODUCTION 

The transition from hunter-gathering to early farming 

— signifying the start of the Neolithic era in traditional 
archaeological terminology — was one of the most impor- 
tant steps made by humanity in developing the complex 
modern societies that exist today. The mechanism of 
the spread throughout Europe of Neolithic farming tech- 
niques, which developed in the Near East around 12,000 
years ago, remains an important and fascinating ques- 
tion. The relative importance of 'cultural' versus 'demic' 
components of the spread — i.e. the transmission of farm- 
ing techniques versus the physical migration of farmers 

— has long been debated in the archaeological literature. 
Intriguingly, advances in genetics mean that quantitative 
assessments of these issues are now becoming possible, 
with many recent studies suggesting at least some mi- 
gration of early farmers (e.g. [l], Q). 

Recently there have been a number of studies using 
population dynamics models to describe the spread of 
Neolithic farmers. Whilst some recent work has focused 
on stochastic methods |3|, most studies have built upon 
the pioneering work of Ammerman & Cavalli-Sforza [Jj, 
and sought the solution of a deterministic partial differ- 
ential equation p-Q- The success of this approach can 
be seen in a number of studies. For example, when us- 
ing values for the model inputs determined theoretically 
or from the archaeological and anthropological literature, 
H, Q found a reasonable agreement between the output 
of the numerical model and the large-scale features of 
the observed first arrival times at Neolithic sites, based 
on the radiocarbon dates of objects found at these sites. 
Unfortunately there may be many other parameter sets 
which provide equally if not better fitting model outputs 
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to the data. We seek these parameters sets by developing 
a rigorous statistical inference method to fit the model 
to the radiocarbon data. 

In this paper we adopt a Bayesian approach to infer- 
ence as this will also allow us to quantify parameter un- 
certainty in a rigorous way. Additionally it allows us to 
quantify correlations between our model parameters, and 
the global uncertainty in the fit of our model to the data. 
The authors are not aware of another study where these 
sophisticated statistical techniques have been used, al- 
lowing the determination not just of parameter estimates, 
but of a plausible range of paranrcter values, given by the 
posterior distribution. 

The 'wave of advance' is one of the most important con- 
cepts in modeling the spread of the Neolithic, underlying 
the studies cited above; since the work of Ammerman & 
Cavalli-Sforza [J] and Clark [l(|, it has been widely ac- 
cepted that the incipient farming spread from the Levant 
to Western Europe in a systematic manner — an outward 
propagating 'wave' — amenable to study by simple de- 
terministic models. The rate of propagation of the wave 
is broadly constant throughout this spread, and various 
authors have estimated the speed of the wave front, U, 
from the radiocarbon data. In the present work, we try 
to produce a statistically reliable estimate of the wave 
speed U from the radiocarbon data. Unlike most if not 
all earlier studies, however, we explicitly account for the 
fact that U is a random variable, due both to the system- 
atic and random errors inherent in the data, and to the 
inherent variability in the underlying population dynam- 
ics processes. (The true spread is not well-modeled on all 
scales, and at all locations, by a wavefront advancing with 
a continuous speed; local variations do of course occur. 
Such local variations are not explicitly modeled within 
simple deterministic wavefront models, which effectively 
average out such small-scale variability, and reproduce 
the spread well on the larger, continental scales.) 

As well as being the natural quantity to describe the 



spread, the wave speed has clear and important impli- 
cations within most mathematical models of the spread. 
For example, within the Fisher-Kolmogorov-Petrovsky- 
Piskunov (FKPP) equation [lj],[l2|, most frequently used 
in models of demic diffusion, the front speed is directly 
linked to the population mobility (diffusivity) and growth 
rate. While the wave front has been most often stud- 
ied within the context of the FKPP equation, many al- 
ternative models of populations dynamics also predict 
waves of advance, with speeds dependent upon their var- 
ious model parameters. For example, within multiple- 
population models related to the FKPP model but allow- 
ing for cultural conversion, the speed can be affected by 
the parameters controlling the conversion [13, [l4| ; simi- 
larly, within other models of cultural transmission, it can 
be linked to the intensity and spatial range of contacts 
|15| . The same is true for many alternative models: e.g. 
models involving alternati ve p arameterizations of growth 
processes and diffusion [5|, |l6[ ; and models involving non- 
Laplacian diffusion (e.g. Levy flights; |17| )■ Even within 
FKPP-likc models with logistic growth, 'time-delay' fac- 
tors attempting to model the generational effects of pop- 
ulation growth more realistically result in a modified re- 
lationship between the front speed and the basic demo- 
graphic parameters [y, 0] • The key quantity which must 
be inferred from the radiocarbon data is the wave speed, 
and so a wavefront-based approach such as that intro- 
duced here, which gives this quantity prominence, is a 
more natural model. 

Edmonson [18[ was the first to estimate the speed of 
the agropastoral transmission in Europe; he gave a value 
of f. 9 km/year. Later, Ammerman & Cavalli-Sforza [4[ 
gave a value of 1 km/year, and most subsequent determi- 
nations of the speed of the Neolithic wavefront have also 
been of order U ~ 1 km/ year [5|, Q ■ There are notable 
regional variations, however. In particular, the Lincar- 
bandkeramik (LBK) culture spread along the Danube- 
Rhine corridor at a higher speed, perhaps as large as 
5 km/year [l{| [2CJ . The spread of the Impressed Ware 
ceramics along the Mediterranean coast has been esti- 
mated to be as fast as 10 km/year jig ], although lower 
estimates are also possible [21(. In contrast, there is no 
clear evidence for any significant acceleration along the 
northern and Atlantic coastline of Europe. In this paper 
we follow |8( in allowing for enhanced anisotropic spread 
along coastlines and along the courses of the Danube and 
Rhine rivers, in an attempt to model the above phenom- 
ena. We perform Bayesian inference for the magnitudes 
of these enhanced effects, and for the magnitude of the 
isotropic background spread; as a result, we are in a po- 
sition to assess the extent to which these features of the 
model are genuinely required by the radiocarbon data. 

Although the posterior distribution of the parameters 
of interest is analytically intractable, computationally 
intensive methods such as Markov chain Monte Carlo 
(MCMC) can be used to generate samples from this dis- 
tribution. The computational cost of simulating the wave 
of advance precludes the direct use of an MCMC scheme. 



We therefore approximate the arrival time of the wave- 
front at each site using a Gaussian process emulator [22[ 
as the computational speed-up makes MCMC methods 
practicable. 

The rest of this paper is organized as follows. We in- 
troduce our wavefront model, and compare its output to 
that obtained from a more traditional approach involv- 
ing partial differential equations (PDEs), in section [TTl In 
section HTT1 we describe the data against which our model 
will be compared. We outline our statistical model, and 
the Bayesian inference scheme used to estimate our model 
parameters from the data, in section IIV1 The results of 
our inference are described in section [Vj and our conclu- 
sions summarized in section I VII Some technical details 
about our statistical methods, including the use of Gaus- 
sian process emulators, arc presented as Appendices. 



II. MODELLING THE PROPAGATING FRONT 



The isotropic FKPP equation, 

dN ( N\ 

— = 7 n(1--)+V-(vVN), (1) 

describes the evolution of population density -ZV(x, t) at 
position x and time i, with the growth rate 7, diffusivity 
v and carrying capacity K as parameters. The solution 
to this equation forms a propagating wave front, which 
travels with a speed, 



U = 2^, 



(2) 



which is dependent on both the diffusivity and the growth 
rate of the population, but importantly is independent of 
the carrying capacity. This result can be readily proved 
in one dimension [12j , and also in two dimensions if the 
curvature of the front is sufficiently small (which is to 
be expected when the distance from the source is large 
compared with the front width) [231 ] . In a spatially het- 
erogeneous environment (such that v and/or 7 vary with 
x), the wavespeed is clearly a function of position, C/(x). 
Numerical solutions to the FKPP partial differential 
equation are most simply obtained by discretizing to a 
grid of points in space, using e.g. finite difference methods 
to approximate the spatial derivatives, and time-stepping 
the solution forward in time. Thus the local population 
density at each point on the grid is calculated at each 
time-step. If we are focusing on the first arrival time of 
the Neolithic farmers at specific radiocarbon sites, how- 
ever, then all modeling of the population behind the front 
is unnecessary; as indeed is the modeling of the equation 
ahead of the front, which solution of the FKPP equation 
also requires. A reasonable alternative is to model only 
the propagating front itself, which we do using a particle- 
based approach. This approach necessarily omits many 
complex demographic effects which may have occurred 
locally within the real spread; but that is also true of 
the FKPP equation most frequently used to model the 



Ncolithisation. And given the limitations of the current 
data, it would be premature to adopt more complex de- 
mographic models for the spread on the continental scale; 
almost all models of the Neolithisation of Europe as a 
whole have therefore similarly focused on the propaga- 
tion of the front. 

We select a starting point for our initial population, 
and at a small radius from this point we approximate a 
circle with a small number of 'particles' (points); these 
particles define our front. An alternative approach is to 
define a regional source, however for simplicity here we 
take a localised source. We keep track of the index of 
the adjacent particles and so can easily define local tan- 
gent and normal vectors, and in particular the local unit 
outward normal n. (Here outward means in the direc- 
tion of the advancing front.) We perform all simulations 
in spherical polar coordinates, at a fixed radius, R, set 
to approximate the Earth's surface (R = 6378 km), and 
define the position in terms of the polar and azimuthal 
angles, <j) and A, respectively; i.e. x = (</>, A). Numeri- 
cally, at each time step, we move the particle i, at po- 
sition Xi = (0,,Aj) and with velocity u^ = J7(xj)iij, a 
small amount in both the <f> and A directions according 
to 



dx, 
d< 



(3) 



Here n^ is the local outward normal, and t represents the 
time in years. As discussed in section HI there are local 
deviations in the rate of spread of the Neolithic, par- 
ticularly along traversable waterways. We follow Q in 
allowing an increased rate of spread along all coastlines, 
and also along the Danube-Rhine river systems. We label 
the enhanced coastal velocity as Vq , and the river veloc- 
ity as Vr. In the partial differential equation approach, 
these velocities appear as additional advectivc terms in 
the FKPP equation (Eq. [IJ , which can be identified with 
anisotropic diffusion [8J]. In the wavefront approach, we 
can instead simply add this effect to the velocity experi- 
enced by particle i, which becomes 



u. = Urn + Vj 



(4) 



A. Spatial dependence 

It is intuitively sensible that the local altitude should 
have a significant effect on the spread of the Neolithic; 
and, indeed, early farming in Europe does not seem to 
have been practical at altitudes greater than 1 km above 
sea level (e.g., there are no LBK sites above this height 
in the Alpine foreland [24j ) . There is also significant evi- 
dence 0,[2a] that the latitude (acting partly as a proxy for 
the climate) had a significant impact on the productivity 
of the land, and therefore the ability to farm. Another 
latitudinal effect noted in the literature is an increased 
competition in the North with the pre-existing Mcsolithic 
population j7j . Both of these factors motivate a decreased 
wavespeed at higher latitudes. 

In order to introduce these spatial dependences into 
the model, we use arrays of geographical altitude 
data taken from the ETOPOl 1-minute Global Relief 
database [26|, taking a dataset with spatial resolution 
of 4 arc- minutes. This forms a 740 by 1100 mesh, with 
approximate longitudinal boundaries of 15° W and 60° E 
and latitudinal boundaries of 25° N and 75° N. 

At points on this grid, we calculate the local velocity 
to reflect the factors outlined above. On land, we require 
the speed to decrease to zero at altitudes above 1 km, 
using a smooth approximation to a step function. To 
allow for limited sea travel we use a different form of cut- 
off at low altitude, setting the speed of the wavefront to 
decrease exponentially with the distance to the nearest 
land (<i c ). We also include a linear dependence of U on 
latitude. As a result, we calculate the local velocity on 
our grid as 



U = U Q 



where the total advection from both river and coastal 
terms at position x, is 

V, = V r csign(ni-Vc,i)Vc,i + F H .sign(ii i -V R ,i)V R ,i. (5) 

Here V c/R are the 'amplitudes' for the river and coastal valucs of v c/R at each point on our mesh, a distance 
advective speeds, and V 



C/R, 



are normalized vectors in 



the direction of the relevant local advection (normalized 
to unit magnitude at points on the river or coast). The 
sign functions ensure that the sense of the advection (e.g. 
upriver or downriver) is that which enhances the outward 
speed of the locally expanding wavefront. We use MCMC 
methods below to infer the acceptable range of the ampli- 
tude parameters Vq and Vr, given the radiocarbon data. 



5 <t> \ \\- itanh{10(a-lkm)}, a > 0, 
4 ~ 100° ) |exp(-d c /10km), a < 0. 

(6) 
Here Uq is the background amplitude determining the 
mean rate of spread; we expect Uq to be of order 
1 km/year, but use our MCMC methods below to infer 
the acceptable range of this parameter (given the radio- 
carbon data) more rigorously, along with the parameters 
Vq and Vr introduced above. The spatial variations in 
U arc illustrated in Fig. [1] (top). 

To deal with the advection terms, the river and coast- 
line vectors used in this study are taken from [27J , which 
contains vector data of the world's coastlines and major 
rivers. We take a subset of this data, which contains 
all the coastlines within our domain, and the river vec- 
tors corresponding to the Danube and Rhine. To obtain 

C/R 

weighted contribution is taken from each of the irregu- 
larly spaced vector data segments which define the river 
in [23| . Specifically, the contribution from each segment 
is weighted by exp(— d vcc /15km), where d vec is the dis- 
tance between the grid point and the river/coastal vec- 
tor (in km). The magnitudes of river/coastal vectors are 
shown in Fig. [T] (bottom). (The small magnitude features 
visible in some inland non-river locations in this plot are 



associated with small lakes, which are included in the 
coastline database. These features have no significant 
effect on our model.) 

It is perhaps worth commenting on our use of present- 
day information about coastline and river locations, since 
these locatinoteons have changed over the timescale of 
the spread we are modelling; most obviously, coastlines 
have changed as a result of sea-level changes arising prin- 
cipally from post-glacial isostatic adjustment. During 
earlier work based on the PDE approach, we investi- 
gated the effect of such changes, using a sea- level model 
supplied by geophysicists from Durham University (e.g. 
[28l l29l ]: Glenn Milne, personal communication). The 
effect on our model was negligible, since the most signif- 
icant changes in sea-level were in the North, and had 
largely occurred by the time that the Neolithic wave 
reaches the northern coasts (so that potentially impor- 
tant land bridges had already disappeared) [30| • We have 
not tried to account for changes in the courses of rivers, 
but we do not expect that such relatively local changes 
would have a significant effect on the large-scale spread 
on which we focus. 

In our wavefront model, to calculate the local speed U 
appropriate at the precise position of particle i (as is 
needed for Eq. 2]), we use bilinear interpolation from the 
values at the four closest mesh-points. Similarly, the local 
advective vectors at the precise position of particle i are 
also obtained using bilinear interpolation from the clos- 
est mesh-points; it is these local vectors which appear in 
Eq. ©. 



B. Wavefront algorithms 

We monitor the separation between the particles, and 
if this becomes larger than some specified value 8, we 
introduce a new particle in order to maintain a roughly 
constant resolution along the front; see Fig. [5] (a) for a 
illustration of this process. Due to the irregular nature of 
the velocity map shown in Fig. [TJ the wavefront can sep- 
arate around low velocity regions (e.g. mountain ranges), 
and subsequently re-merge. We apply algorithms initially 
used to model the evolution of magnetic flux tubes in as- 
trophysical simulations [31( to merge wavefronts in the 
particle model. Every time-step we check the distance 
between each particle and all the other particles in the 
simulation. If the separation of any two particles (who 
arc not neighbors) is less than the resolution length S, 
then we remove the encroaching points and switch the or- 
dering of the loops, so as to merge the fronts; a schematic 
of this process is shown in Fig.[2](b). Typically this pro- 
cess results in a merging of the main front and the cre- 
ation of a small loop behind the main front, which we 
normally remove from the simulation to avoid unneces- 
sary numerical effort. Snapshots from a simulation where 
these small loops are not removed are shown in Fig. [3] 
In the work reported here, we take S to be equal to our 
grid spacing of 4 arc-minutes. 
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FIG. 1: (Color online) (a) Spatial variation of the speed C/(x) 
(from Eq. [6|; (b) magnitude of the distance weighted coastal 
and river vectors, |V;| (as in Eq. [5]). 



C. Comparison between simulations 

We now present qualitative results on the compari- 
son between the wavefront model and the PDE model 
for a typical parameter set. For these (and subsequent) 
calculations, we place the initial source of the farming 
population at 40°N, 35°E, within the Fertile Crescent. 
The starting time for the spread is taken as 6572 years 
cal BC as this date is consistent with Q and the radio- 
carbon data at the site Tell Kashkashok. The solution 
to the FKPP equation, Eq. (|T|), is solved numerically, 
approximating spatial derivatives using a second-order 
finite-difference scheme, and time stepping using an Euler 
scheme. The spatial resolution is 4 arc-seconds, on a 740 
by 1100 mesh. (This is the same mesh introduced above 
to control the spatial variations for our wavefront model; 
the same spatial variations are used for the FKPP equa- 
tion.) The reduced computational effort of the wavefront 
method provides us with a significant speed-up, with a 
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FIG. 2: Schematic of the algorithms: (a) for the insertion of a new point (open circle) in a spreading front; (b) for the removal 
of encroaching points (central circles) in merging fronts. 
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FIG. 3: (Color online) The wavefront model plotted just before (a) and after (b) a merging event. The color gradient plot 
displays the altitude (in meters) for this region (the eastern Mediterranean). Note that the small loop shown behind the merged 
front would normally be removed from the calculation, but has been left in here for illustrative purposes. 



typical simulation taking approximately 10 seconds on a 
single processor with a clock speed 2.67GHz. The cor- 
responding solution to the FKPP equation, with a time- 
step satisfying the Courant-Friedrichs-Lewy (CFL) con- 
dition |32| , requires approximately 24 hours on an eight- 
processor cluster with hyper-threaded Intel Xcon quad- 
core processors of the same clock speed. 



Whilst we do not expect an exact agreement between 
the two models, we do expect their arrival time at a par- 
ticular site to be similar. Indeed, we do find a reasonable 
agreement between the two models, with a maximum dis- 
crepancy in the arrival time of approximately 120 years 
and typical values less than 50 years (for a simulation 
covering of order 5000 years) . Fig. @] shows snapshots at 
four times during the simulations. 



III. RADIOCARBON DATA 

We use a compilation of 302 dates from sites in South- 
ern and Western Europe from [2a|, [33| and [34|]. These 
data contain multiple dates per site and so we determine 
a single date for each site by using a method based on 
[201 ] . The method we use is described in detail in Davi- 
son et al. [35|. Briefly, for sites with at least eight date 
measurements, a x 2 statistical test is used to determine 
the most likely first arrival date from a coeval sub-sample, 
and for sites with fewer measurements, we use a weighted 
mean of these measurements. Fig. [5] shows a plot of the 
radiocarbon sites shaded according to the estimated first 
arrival time, ij, obtained from this statistical treatment. 

In this paper we focus on a statistical model with a 
single error term accounting for mismatch between the 
data and the wavefront; this is described in the next sec- 
tion. Whilst this approach is not entirely satisfactory, 
the additional complication of properly accounting for 
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FIG. 4: A comparison between the numerical solution to the FKPP equation and our propagating wavefront model. The 
output from the wavefront model is shown by the thick black lines. The corresponding output from the FKPP model is shown 
by the grayscale plots of the population density, with the scale given to the right of panel (d); the values are in terms of people 
per square kilometer. The four panels are for times 1000 years (a), 1800 years (b), 3000 years (c), and 4600 years (d) after the 
start of the simulations. In the wavefront simulations C/n = lkm/year, Vc = 2km/year, Vr = 0. A consistent parameter set is 
used in the FKPP simulation. 



varying site errors would add another level of complex- 
ity, which was deemed prohibitive for this initial investi- 
gation. However, a statistical model which accounts for 
errors in the dates that can vary between sites may be 
adopted in future studies. 



IV. BAYESIAN INFERENCE 

We now outline the proposed statistical model and pro- 
vide details of our Baycsian inference scheme. 

Let t(xj|0) denote the time at which the wavefront 
arrives at site i, at position Xj, for i = 1, . . . , n, where n = 
302 is the number of radiocarbon sites in our data set. 
Here 9 = (Uq, Vc, Vr) t is the vector of model parameters 



about which we want to draw inferences. The observed 
arrival time at site i is denoted by tt and these times are 
displayed in Fig. [5] Our statistical model assumes that 
these data are generated by the wavefront model subject 
to (spatially) independent normal errors. Specifically, at 
site i we have 



t, 



t(xj|0) -fere; 



(7) 



where the e, are independent and identically distributed 
standard normal random variables and a is the spatially 
homogeneous standard deviation, allowing for a mis- 
match between the model and the observations (i.e. local 
variations in the arrival of the wavefront, corresponding 
to the expected local deviations from the idealized, global 
model). 
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FIG. 5: The radiocarbon sites we use to constrain our sim- 
ulations. The colorbar shows the estimated first arrival time 
at a site, in time BC, from the radiometric data. 



By adopting a Bayesian approach to the problem of 
inferring the model parameters, we express initial beliefs 
about likely parameter values via a prior distribution, 
denoted ir(0,a). We then construct the posterior dis- 
tribution of our parameters, given the observed arrival 
times. Bayes' theorem gives this posterior distribution 
as 



7t(0,ct|£) oc7r(0,cr)7r(£|0,cr), 



(8) 



where n(t\0,a) is the likelihood function, i.e. the joint 
probability of the observed arrival times, regarded as a 
function of the parameter values. If we assume the model 
outlined in Eq. ([7]) , then we can write the likelihood func- 
tion as 



ir(t\0 7 <j) ex a n exp 



1 1 



T(xi|0)} : 



(9) 



We specify our fairly weak a priori beliefs about by 
adopting independent lognormal distributions for the 
components Uq, Vc and Vr, with modes chosen to match 
previous archaeological estimates d, |2ll [25|]. Specifi- 
cally we take U ~ LN(0.5,0.71 2 ), V c ~ LN(1,0.5 2 ) and 
Vr ~ LN(2.2, 0.8 2 ). We use a weakly informative in- 
verse Gamma prior for the global error parameter, with 
(T 2 -/G(5,10 6 ). 

Due to the complex dependence of the wavefront so- 
lution on the model parameters, the posterior in Eq. (|5J) 
is analytically intractable. Markov chain Monte Carlo 
(MCMC) methods are commonly used, in the context of 
Bayesian inference, to sample intractable posterior dis- 
tributions. These methods aim to construct a Markov 
chain whose invariant distribution is the desired poste- 
rior distribution. Such approaches are particularly useful 
for Bayesian inference since the target distribution need 



only be known up to proportionality. A recent review of 
these methods can be found in J36( . 

In this paper we focus on a Gibbs sampler J37|. This 
particular MCMC scheme can be useful for sampling 
from high dimensional distributions, and requires the 
ability to sample from the full conditional distribution of 
each parameter (or more generally, subsets of parameter 
components). In the absence of analytically tractable full 
conditionals, a Metropolis-Hastings scheme can be used 
for this. Such an approach is often termed Metropolis 
within Gibbs, and its use is outlined in the next section. 



A. Markov chain Monte Carlo algorithm 

In this section we provide details pertinent to our im- 
plementation of the MCMC scheme. A more detailed 
description of the algorithm can be found in Appendix [Al 

We consider a Gibbs sampling strategy where we alter- 
nate between draws of and draws of a 2 (and therefore 
a) from their full conditional distributions. The form of 
the statistical model and its inverse Gamma prior permit 
an analytically tractable full conditional for a 2 . Conse- 
quently realizations of a can be sampled directly. The 
full conditional density of 0, namely n(0\cr, t), however, 
is intractable and we therefore use a Metropolis-Hastings 
scheme to sample from the corresponding distribution. In 
brief, a Markov chain is constructed by generating can- 
didate values of each component of via a symmetric 
random walk with normal innovations on the log-scale: 
this ensures proposed parameter draws are non-negative. 
A proposed value is accepted as the next value in the 
chain with a probability that ensures the Markov chain 
has invariant distribution given by the distribution of in- 
terest. If a proposal is not accepted then the next value 
for that parameter is taken to be its current value. The 
acceptance probability requires that the target density 
can be evaluated up to proportionality. Each MCMC it- 
eration therefore requires a single run of the wavefront 
model expanding across the whole of Europe. 

Unfortunately, the number of MCMC iterations re- 
quired to produce near independent draws from the 
joint posterior distribution precludes using the wavefront 
model to evaluate each t(xj|0). (The individual wave- 
front model calculations run too slowly, given the large 
number of iterations required.) 

To proceed, we seek a faster approximation of the first 
arrival times from the wavefront model. One option is to 
use a deterministic approximation, such as linear inter- 
polation or cubic splines. Initially the wavefront model 
would be run at a specific set of parameter values, and 
the arrival time at each site stored. Parameters within 
the interpolation method could readily be computed from 
this output. The arrival time at new parameter sets 
could then be approximated using the chosen determinis- 
tic 'emulator'. In the statistical literature [38[ Gaussian 
process emulation is favored, as this not only interpo- 
lates smoothly between design points but also quantifies 



levels of uncertainty around interpolated values. Further 
details on how to build and test such emulators can be 
found in Appendix |5] Using these emulators to approxi- 
mate the first arrival times t(xj|0) at each site makes the 
MCMC scheme outlined above computationally practica- 
ble. The scheme produces a sample from the joint pos- 
terior distribution of our model parameters. 



V. RESULTS 

We now present results obtained from the output of 
the MCMC scheme. We performed 5.5 x 10 6 iterations 
of the algorithm before discarding the first 5 x 10 5 pa- 
rameter draws as 'burn in' to allow the chain to converge. 
The remaining 5 x 10 6 iterates were then thinned to re- 
duce the autocorrelation in the sample: we took every 
500 th iterate, leaving a sample of 10 4 (almost) uncorre- 
cted values from the joint posterior distribution. We 
assessed convergence of the MCMC scheme by repeating 
the above procedure for many different starting param- 
eter sets (randomly drawn from the prior distribution) 
and found no problem with convergence. 

The output of the MCMC scheme is summarized in 
Fig. |51 It shows kernel density estimates [39[ of the 
marginal posterior probability density function, for each 
of Uq, Vq, Vr, and a. For the three parameters in our 
mathematical model, the modes of the marginal poste- 
rior distributions (in black in Fig. [5]) are of the mag- 
nitude expected from other studies in the literature (as 
described in section [TJ) . Compared with their respective 
prior distributions (in red), the posterior distributions 
are considerably tighter, showing that the radiocarbon 
data have indeed been informative, and have effectively 
constrained the plausible range of model parameters. For 
example, posterior samples of the background wavespeed, 
Uq (with a mode of approximately 1 km/year, and a 95% 
range of 0.79-1.41 km/year), are in good agreement with 
the studies cited in section U 0, [a, Q • I n terms of an 
FKPP model, with a growth rate of 7 ~ 0.02 year -1 (of 
the order typically used in such models) , this would cor- 
respond to a diffusivity of v ~ 13 km /year; this value is 
also comparable to those typically used in FKPP models. 

The only previous studies which have modeled an en- 
hanced population mobility along waterways [8|, [9J were 
motivated by studies based on specific local phenom- 
ena. For example, the incorporation of enhanced coastal 
speeds within the model was motivated by radiocarbon 
evidence for the spread of the Impressed Ware culture 
along the coastline of the Western Mediterranean, with 
some estimates of speeds of order 10 km/year [lj| l2l| . 
Whilst some form of enhanced spread along the coastline 
of this region may be required, extrapolating to a sim- 
ilar spread along all of Europe's coasts might very well 
give an inferior fit to the data as a whole. (E.g., if we 
take the rate of spread along all of Europe's coastlines 
to be of order 10 km/year, then the wavefront may ar- 
rive at many sites far earlier than the radiocarbon data 



suggest.) The inference presented in this paper seems to 
confirm this, as marginal posterior samples of the global 
coastal propagation speed, Vc (with a mode of around 
0.3km/year, and a 95% range of 0.23-0.41 km/year) are 
markedly lower than the values quoted for the Impressed 
Ware culture. 

Posterior samples of Vc are also significantly lower 
than the corresponding value used in |8[ (20 km/year) . 
It may be that, compared with that earlier work, our 
more rigorous method of comparing models against the 
data suggests an improved model fit without a large en- 
hanced coastal speed, and with regional data variations 
(such as those associated with the Impressed Ware cul- 
ture) largely accounted for within the global error param- 
eter in our statistical model (discussed further below). 
This may not entirely explain the difference between the 
two studies, however; it is also possible that the imple- 
mentation of coastal advection in Q has exaggerated the 
overall magnitude of this effect. The value quoted there 
for the advection actually only applies at points exactly 
on the coast, whereas nearby locations experience a re- 
duced velocity, decreasing with their distance from the 
coast; as a result, the mean, effective, advective speed 
may be somewhat lower than the peak value quoted. 

In terms of the river advection, the marginal poste- 
rior values of Vr are clearly non-negligible (with a mode 
of approximately 1.0 km/year, and a 95% range of 0.72- 
1.38 km/year); it is comparable to the background spread 
modeled by Uq, vindicating the suggestion of anisotropic 
spread along these river basins. However, this value is 
significantly smaller than the values normally quoted for 
the spread of the LBK culture (of order 5 km/year) [8| . 
This deviation may be due to the reasons discussed above 
for the coastal velocity. However, the discrepancy with 
the widely-accepted archaeological timescale for the LBK 
culture (which does not refer to a spatial mathematical 
model, but is obtained directly from a coeval set of ra- 
diocarbon dates) requires further explanation. Looking 
in detail at the data in this region, it may be that the 
earliest dates associated with the LBK culture simply 
do not correspond well to spread by a continuous wave 
of advance (anisotropic or otherwise); rather, early set- 
tlements may have been been formed by something like 
a leapfrog or pioneer mechanism, and the whole region 
only settled (and outward spread continued) after some 
subsequent delay. In this paper we have studied a model 
of large-scale spread, but it may be the model is too 
crude and would need small-scale refinement to explain 
the spread of the LBK culture. 

For the global error parameter in our statistical model 
(Eq. [7]), er, the marginal posterior distribution is centred 
on a value of order 600 years (see Fig. [5]). The 95% confi- 
dence range is 577-671 years. This is significantly larger 
than the values typically quoted for the effective mini- 
mum uncertainty of radiocarbon dates for this period, of 
order 160 years |20fl . (The latter value is derived empir- 
ically from well explored, archacologically homogeneous 
sites, effectively allowing for sample contamination and 



other sources of errors; this may be contrasted with the 
quoted laboratory uncertainties, which characterize only 
the accuracy of the laboratory measurement, regardless 
of the provenance of the sample.) The global parameter 
(a) in our model, however, does not merely reflect the 
uncertainty in the dating, but also allows for the mis- 
match between our mathematical model of the spread (a 
globally continuous wave of advance) and the regional 
variations present in the actual spread. Thus our infer- 
ence suggest that a simple global wave of advance across 
Southern and Western Europe, while remaining a good 
model on the continental scale, should only be consid- 
ered a good model on timescales of order 600 years (and 
thus lengthscales of order 600 km) or greater; on shorter 
timescales (and lengthscales), significant local variations 
should be expected. As noted briefly above (in our dis- 
cussion of the advective velocities) , this parameter within 
our model may to some extent allow for regional varia- 
tions that might alternatively be modeled by specific re- 
gional effects (e.g. river advection), thus explaining the 
relatively low values of our inferred advective speeds. 
The posterior distributions suggest that this type of fit 
— requiring relatively large global uncertainty, but then 
favoring relatively low local advective speeds — is the 
optimum way of explaining our dataset of first arrival 
times within a wave of advance model of the sort pre- 
sented here. Of course, the conclusions of this inference 
may depend upon specific features of the models used 
here (both mathematical and statistical), and may vary 
for different models. In extensions to the current work we 
will explore alternative models, e.g. allowing for increased 
regional variation in the coastal advection, and allowing 
for spatial correlation between nearby sites within the 
statistical model. We will also consider additional, more 
recent, radiocarbon data, which will provide observed ar- 
rival times at new sites, and will thereby allow an out- 
of-sample assessment of prediction error. 

In addition to performing inference for the model pa- 
rameters, we use predictive simulations to assess the va- 
lidity of the statistical model and the underlying model 
of the wave of advance. The posterior predictive dis- 
tribution of the arrival time at a site i, ti_ prc d, can be 
determined as follows. Using Eq. we have that 

t i , P rod|e,a~N(T(x J |0),a 2 ), 

where r(xi|0) is the arrival time of the wavefront, approx- 
imated by the emulator. We therefore take each sampled 
parameter value (9^' ,a^') from the MCMC output and 
generate a realization from the predictive arrival time dis- 
tribution by simulating realizations from t\ p re d|0 ■> a ■ 
We thus obtain a sample of first arrival times at each site. 
Fig. [7] shows the predictive densities for three sites. 

To see where our model is failing to agree with the 
radiocarbon data, sites where the observed radiocarbon 
date falls outside an approximate 95% credible interval 
for predicted first arrival are plotted as filled symbols 
in Fig. [5] sites which fall inside this interval are plotted 
as open circles. We distinguish between sites where our 



model predicts an earlier arrival time than is observed 
in the radiocarbon data, and those where our model ar- 
rives late. Where we predict an earlier arrival time, it is 
quite possible that the radiocarbon data at the site are 
simply from a relatively late settlement within this local 
region, and earlier data there have yet to be discovered. 
Where the model predicts a later first arrival time than 
is observed, then this may be an indication that some lo- 
calized process, which we have not included in our model, 
has caused a much faster spread in this region. 



VI. CONCLUSIONS 

In this paper we have introduced an innovative wave- 
front model, which allows the efficient simulation of the 
spread of a wave of advance model (with both isotropic 
and localized anisotropic components of spread); we have 
applied this model to the spread of Neolithic culture 
across Europe (with the localized anisotropy being asso- 
ciated with a hypothesized enhanced rate of spread along 
certain waterways). We adopted a Bayesian approach to 
the problem of inferring the model parameters given ob- 
served arrival times, which we assumed were given by 
the wavefront model but subject to Gaussian error. A 
Markov chain Monte Carlo scheme was used to sample 
the intractable posterior distribution of the model param- 
eters. To alleviate computational cost, we constructed 
Gaussian process emulators for the arrival time of the 
wavefront at each radiocarbon site. As a result, we ob- 
tain the marginal posterior probability distributions for 
the model parameters of interest: the background rate of 
spread (Uq), and the enhanced rates of spread associated 
with coastlines (Vc) and with the Danube-Rhine river 
systems (Vr). To our knowledge, this is the first attempt 
to apply such inference techniques to this problem. 

We find that the posterior variance is reduced (relative 
to the prior variance) suggesting that the data have been 
informative. Marginal posterior samples of Uq, with a 
modal value of order 1 km/year, are consistent with pre- 
vious studies [5|, Q. Modal values for Vc and Vr are 
of order 0.3 km/year and 1 km/year, respectively. This 
value for the river advection (Vr — 1 km/year) is clearly 
comparable to the speed of the background spread (Uo), 
confirming that an enhanced spread within these river 
basins can be robustly concluded from the data. This 
value is nevertheless significantly smaller than the value 
of 5 km/year often quoted for the rate of spread of the lo- 
cal Neolithic culture (the LBK culture) [13, [2(| ■ A closer 
inspection of the relevant data suggests that the spread of 
this particular culture may not be particularly well mod- 
eled by a continuous wave of advance, and subsequent 
models for this region may wish to pursue other possibil- 
ities; the estimate of 1 km/year given above should sim- 
ply be considered as the best-fitting value within the con- 
straints of the current model. The relatively low modal 
value for the coastal advection (Vc — 0.3 km/year) sug- 
gests that such an advection, while not negligible, should 
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FIG. 6: (Color online) Marginal posterior densities (in black; prior distribution in red) for (a) Uo, (b) Vc, (c) Vr and (d) a, 
based on the (thinned) output of the MCMC scheme, using a Metropolis within Gibbs sampler. 
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FIG. 7: (Color online) Predictive densities for the arrival time at three sites plotted as black lines, with the observed radiocarbon 
dates plotted as (red) asterisks, (a) Site Kremenik, located at 42. 3N, 23.27E, an example of a bad model fit, where our model 
predicts a much earlier arrival time than is presently observed; this point can be seen plotted as a triangle in Fig. [8] The middle 
and right panels are examples of more typical agreement between the model and data; (b) Agrissa Magoula (39.63N, 22.47E), 
(c) Seskto (39.28N, 22.82E). 



not be considered particularly significant throughout Eu- 
rope as a whole. This is perhaps not surprising, given 
that the principal motivation for this effect only applies 
to a specific region of Europe (the Western Mediter- 
ranean coastline, along which the Impressed Ware culture 
spread HHHl). 

In addition to performing inference for the parameters 
characterising the wavefront, we also infer the 'global er- 
ror' a (here formally introduced within our statistical 
model) , representing both uncertainty in the radiocarbon 
dates and the misfit between our simple global wavefront 
model and the true spread (with its regional variations 
and local anomalies). The posterior modal and mean 
values for a are of order of 600 years, significantly larger 
than the uncertainty normally associated with radiocar- 
bon dates for sites of this period (of order 160 years) |20j . 
We therefore argue that this timescale, of order 600 years 
(and consequently also a lengthscale, of order 600 km), is 
the scale at which the spread of the Neolithic in Europe 
can be considered well-modeled by a simple wave of ad- 
vance: at longer timescales and lcngthscales (and clearly 
on the continental scale), such a model of the spread per- 



forms well; at shorter timescales and lcngthscales, signif- 
icant local deviations from such a simple spread must be 
expected. The quantification of this scale is an important 
result from our inference. 

Of course, the conclusions above must depend to some 
extent upon the specific models introduced here (both 
our mathematical wavefront model and the statistical 
model involving a global error parameter). In exten- 
sions to this work, we intend to investigate the robust- 
ness of these results with respect to various changes in 
these models. In our wavefront model, we first intend to 
investigate the possible importance of more regional vari- 
ations within the enhanced spread along waterways. For 
example, we plan to allow different amplitudes of coastal 
enhancements within different regions, thus allowing us 
to explore more effectively the possibility of a regionally 
enhanced spread in the Western Mediterranean, as pro- 
posed for the Impressed Ware culture there. We may also 
allow for advective velocities along other river systems, 
in addition to the Danube and Rhine. 

Implicit in our statistical model is the assumption of 
spatially homogeneous normal errors. This assumption 
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FIG. 8: (Color online) Discrepancies between the predicted first arrival time (modal values of £i, pr ed) and the observed values 
(£;). Filled blue triangles (red squares) show sites where the model predicts anomalously early (late) arrivals. Open circles 
show sites with more acceptable agreement. 



may be unnecessarily restrictive, and alternative statisti- 
cal models (with more complex error structures) should 
be considered. For example, the statistical model may be 
adapted to allow for dates at different sites having dif- 
fering uncertainties; building this into the model might 
result in a more meaningful fit (e.g. avoiding the possibil- 
ity that a single global error parameter, as used here, may 
be unnecessarily smoothing out the fit everywhere) . Fur- 
ther model refinements may also be possible. The wave 
of advance clearly expects that nearby sites will have sim- 
ilar arrival times; in the current model, however, nearby 
sites arc not linked in any way. We will therefore allow 
for spatial correlation between nearby sites, potentially 
helping to smooth out locally anomalous dates, and also 
allowing another estimate of the scales over which the 



radiocarbon data correspond well to a simple wave of 
advance. 

The Ncolithisation of Europe is obviously not the only 
possible application for the methods introduced here, and 
applications to other regions or to other prehistoric peri- 
ods (e.g. the dispersal of palaeolithic cultures) also have 
great potential. Other applications would of course have 
their own difficulties, with one likely challenge being the 
relative scarcity of empirical data in many cases. One 
such case is the spread of Neolithic culture from the Near 
East to South Asia; there are significant gaps in the ra- 
diocarbon record between these regions, and it would be 
of great interest to see how our methods could help to 
model this spread. 
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Appendix A: The Metropolis— Hastings algorithm 

We provide a detailed step-by-step description of the 
MCMC scheme we use to sample from the posterior dis- 
tribution of the model parameters, n(0, o~\t). (This type 
of scheme is well-established within the statistical liter- 
ature [3a . |37| . but is presented here to help the more 
general readership to appreciate the current work.) 

We use a Gibbs sampling strategy, alternating be- 
tween draws of the full conditional distributions n(a\6, t) 
and Ti(0\a,t). Algorithmically, we perform the following 
steps: 

1. Initialise a^ and (o) . Set j = 1. 

2. Draw a^ ~ tt( ■ | 9 u ~ 1] ,t). 

3. Draw 9 {j) ~ tt(- | a^\t). 

4. Set j := j + 1 and go to step 2. 

The resulting Markov chain has invariant distribution 
given by Tr(9,a\t) [36[. The full conditional for a can 
be sampled straightforwardly as, if £ = a~ 2 then 



(\9,t~ Gamma (A, B) , 



(Al) 



where 



.4 



ii 
2' 



D 



i=\ 



{U - r(xi|0)} 2 /2. 



Hence, in step 2 of the Gibbs sampler, &&> is gen- 
erated by first drawing C,^\9^~ 1 ' ,t and then setting 
crO) — l/yJCti). Since the full conditional for 9 is analyt- 
ically intractable we use a Metropolis-Hastings update in 
step 3. Define 

A= (A!,A 2 ,A 3 ) T = (log(C/ ),log(^c),log(F R )) T 

and note that under the prior specification adopted for 
9, each component A,, i = 1, 2, 3 follows a normal distri- 
bution (independently) a priori. In step 3 of the Gibbs 
sampler we propose a new value A* via a symmetric ran- 
dom walk with normal innovations, that is 



A* = A; + LO % , 



N(0,S* 
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where the Si are tuning parameters, the choice of which 
will influence the mixing of the Markov chain. Large 
values of Si will lead to small acceptance probabilities, 
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and the chain will rarely move; whereas small Si will lead 
to many accepted proposed values, but slow exploration 
of the parameter space. We accept the proposed value 
and take A^ = A* with probability a, otherwise we 
take the current value A^" = A^ J ~ '. The acceptance 
probability is given by 



a = min < 1 , 



7r(A*)7r(i|A*,<7 



0')- 



■K{X {j - l) )-K{t\X {j - l \a^)) 



(A2) 



where 7r(A) denotes the prior density ascribed to A and 
7r(i|A, a) is given by Eq. (J9j) with = exp(A). 



Appendix B: Emulation 

The MCMC inference scheme typically requires many 
iterations, with each iteration requiring a full simulation 
of the expanding Neolithic front to evaluate the likeli- 
hood function. As simulations of the front are computa- 
tionally expensive, we emulate the model using Gaussian 
processes (GP) [22|, that is, stochastic approximations 
to the arrival times obtained from the wavefront model. 
These methods are widely used in the computer models 
literature; see, for example, [38[ and references therein. 
In brief, the wavefront model is run for a set of train- 
ing points; the emulator then allows the interpolation 
of the model output between these points. For prag- 
matic reasons, we build an individual emulator for each 
radiocarbon site, rather than attempt to build a complex 
time-space emulator. 

Consider the arrival time t(xj|0) at a single site i. 
For simplicity of notation, we denote this arrival time by 
t(0). Our emulator for the arrival time uses a Gaussian 
process with mean m(-) and covariance function &(-,-), 
that is 



7-(-)~GP(m(.),fc( v ))- 



(Bl) 



We choose a suitable form for the mean function, given 
the approximate relationship expected between the ar- 
rival time at a site and the parameter values, which are 
all speeds. The simple relation d = Ut, where d repre- 
sents distance, t time and U speed, gives t ex 1/U, so we 
choose a mean function which reflects this: 



m{9) = ao + Q'i 



a 2 



1 



1 



a 3 



Vc' 



(B2) 



where the coefficients otk arc determined using least 
squares fits. These coefficients essentially account for the 
relative importance of diffusive, river and coastal spread, 
given the complicated geography between the source of 
the spread and the particular site being emulated. There 
are various possible choices for the form of covariance 
function. We use a stationary Gaussian covariance func- 
tion 



k(9,0) = aexp 




(B3) 



with hypcrparameters a and Tj (j = 1,2,3), which must 
be determined from the training data. 

Suppose that p simulations of the (computationally ex- 
pensive) wavefront model are available to us, each provid- 
ing the arrival time at each radiocarbon-dated site. Let 
t(6) = (t(0i), . . . , t(6 p )) t denote the p- vector of arrival 
times resulting from the wavefront model with input val- 
ues 6 = (<?!,..., 6> P ) T , where 9 % = (U ,i, V R , h V c ,i) T . A 
Gaussian process can be viewed as an infinite collection of 
random variables, any finite number of which are jointly 
normally distributed. Therefore, from Eq. (|B1|) . we have 

r(0)~7V(m(0),#(0,0)), 

where m(0) is the mean vector with jth element m(9j), 
and K(Q, 0) is the variance matrix with (j, £)th element 
k(9 j ,9 i ). 

We can model the front arrival time at the site for 
other values of the input parameters, 9*, as follows. Us- 
ing the standard properties of the multivariate normal 
distribution, the arrival time has distribution 



r(0*)|T(e)~iV (/*(*•)> s(*')) 



(B4) 



where 



H{9*) = m(9*) + K{9*,0) [K(Q, 0)]" 1 [r(0) - m(0)] 
£(0*) = K{9*,9*) - K(9*,G) [K(Q, 0)]" 1 K(@, 9*). 

To simplify the notation, we have dropped the depen- 
dence in these expressions on the hyperparameters a and 
rj (j = 1,2,3). 

1. Fitting the emulator 

We build a separate emulator for each site for which 
we have radiocarbon data. Although a single run of the 
wavefront model for particular input parameters 0; is 
computationally intensive, such a run gives the first ar- 
rival time at all sites, so that only p runs of the wavefront 
model arc needed to construct the training data for all n 
emulators. 

We fit each emulator using a Metropolis-Hasting al- 
gorithm (similar to that described in Appendix [A"|l . to 
obtain the posterior distributions for the hyperparam- 
eters. Fig. [5] (left) shows the traces of the resulting 
hyper-parameter chains (for a single radiocarbon site), 
and Fig. IH1 (right) shows the corresponding posterior den- 
sities. These plots are representative of the MCMC esti- 
mation of the posterior hypcrparametcr distributions at 
other radiocarbon sites. 

Whilst it is possible in theory to fit the emulator and 
the statistical model in Eq. ^ jointly using an MCMC 
scheme, this would be extremely computationally expen- 
sive. We therefore fit the emulator and the statistical 
model separately. In particular, when using the emulator 
output in the inference scheme described in Appendix [Al 
we fix the hypcrparameters at their posterior means. We 
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believe this approach is justified, as even allowing for 
the (low) posterior uncertainty of the hyperparameters 
makes little difference to the (predictive) fit of the emu- 
lators. 

For illustration, Fig. [10] shows the output from one 
emulator, together with the training data, when we fix 
two of the parameters (Vc and Vr) and consider only 
variations in the Uq-bxis. The magnitudes of the er- 
rors shown in the plot are consistent with those from 
the three-parameter emulator. 



2. Selection of training points 

The selection of the training points 9 in parameter 
space merits further comment. Although using a regular 
lattice design is appealingly simple, it is not particularly 
efficient. Instead, we adopt a more commonly used de- 
sign for fitting Gaussian processes, the Latin Hypercube 
Design (LHD) [40| . Designs of this class distribute points 
within a hypercube in parameter space more efficiently 
than a lattice design. If we consider any single param- 
eter direction in isolation, the mean separation between 
points is p~ x , as opposed to p" 1 / 3 for a regular lattice. 

We constructed our 200-point LHD using the Matlab 
routine lhsdesign. Initially we set the lower bounds 
of the hypercube to be the origin and used the up- 
per 1 percentiles of the prior distribution as its upper 
bounds. We then repeatedly ran the inference algorithm 
(described in the following section), used the results to 
determine a conservative estimate of a hypercube con- 
taining all points in the MCMC output (and therefore 
plausibly containing all of the posterior density), and 
generated another LHD. The final LHD used for infer- 
ences on (C/o,Vc,^r) t in this paper is contained within 
the hypercube (0,3.1) x (0,3) x (0,2). 



3. Testing the emulator 

It is imperative that the accuracy of the fitted emu- 
lators as an approximation to the wavefront model be 
assessed. We therefore considered various quantitative 
statistics [41(. We created a second LHD with p* = 100 
points, 9* — (0* , . • • , 01* ) T , and determined the front ar- 
rival time at all sites for each 0* using both the emulator 
mean (with its hyperparameters fixed at their posterior 
means) and the wavefront model: we denote these arrival 
times by r* and r respectively. (Separate values of these 
quantities exist for all sites; but for simplicity, as in the 
preceding sections, the specialization to individual sites 
is left implicit.) 

For brevity, we discuss only the analysis of a statistic 
which includes both site-specific accuracy and correlation 
between residual errors (at the emulator test points) : the 
Mahalanobis distance, MD, defined via 



where 



MD 2 = (t* - t) t V{®*)- 1 {t* - t) 



(B5) 



v(e*) = K{e*,e*) - k(g*, e)K(e, q^kio, e*). 

(Note that V(Q*) is defined analogously to E(<?*) above, 
but now contains information about all p* test points 
in 9*.) It can be shown that MD 2 follows a scaled F- 
distribution [LjJ in the case of the GP emulator, with 
MD 2 ~ p*(p - 5)F p ^ p _ 3 /(p - 3). Figure [IT] shows the 
Mahalanobis distance at each site, together with the up- 
per 95% point of its distribution. This, along with our 
analyses of other statistics (not presented here) , confirms 
that the emulators provide a reasonable fit throughout 
the design space. These diagnostics gave similar results 
for different LHDs, without any systematic site-specific 
biases. 
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FIG. 9: Traces for the hyper-parameters of the emulator for the radiocarbon site Achilleion, located at latitude 39. 2N, longitude 
22.38E (a), and the posterior distributions of the four hyper-parameters from these MCMC chains (b). The output from these 
chains varies from site to site, and it is important to construct a separate emulator for each site, to obtain accurate emulation 
of the wavefront model. 
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FIG. 10: Color online) The mean output from one emulator 
(thick black line), with the training points used also plot- 
ted (red circles). The thin upper and lower lines show ± 
two standard deviations of the emulator output. The (blue) 
squares represent further output from the wavefront model, 
which were not used in constructing the emulator. The dis- 
crepancy between these points and the emulator output can 
be used as a test of the emulator. 
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FIG. 11: (Color online) Test of the fit of the emulators to the 
radiocarbon data: the Mahalanobis distance MDi is plotted 
for each radiocarbon site i. The horizontal (red) dashed line 
(at MDi = 11.44) marks the upper 95% point of the Maha- 
lanobis distance distribution. 



